Method of Renaming Registers in Register File and Microprocessor Thereof

ABSTRACT

A microprocessor for processing instructions comprises multiple clusters for receiving the instructions, each of the clusters having a plurality of functional units for executing the instructions, and multiple register sub-files each having multiple registers for storing data for executing the instructions, wherein each of the clusters is associated with a corresponding one of the register sub-files so that an instruction dispatched to a cluster is executed by accessing registers in the register sub-file associated with that cluster. The microprocessor further comprises a register-renaming unit for renaming target registers in an instruction with registers in the register sub-file associated with the cluster to which the instruction is dispatched, and issue-queue units each of which is associated with a corresponding one of the clusters, wherein an issue-queue unit holds an instruction renamed by the register-renaming unit until the renamed instruction is issued to be executed in the cluster associated with the issue-queue unit.

CROSS REFERENCE TO RELATED UNITED STATES APPLICATIONS

This application is a continuation of, and claims priority from, U.S. patent application Ser. No. 11/511,677, filed on Aug. 29, 2006, which is a continuation of, and claims priority from, U.S. patent application Ser. No. 10/087,880, filed on Mar. 4, 2004 of Mayan Moudgill, the contents of which are incorporated herein in their entirety.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to functional units and registers to process data in a microprocessor, and more particularly, to a microprocessor with clusters and register files which are associated with each other to enhance the efficiency of data processing therein.

2. Description of the Related Art

A microprocessor in an electronic system generally contains multiple functional units and multiple registers for use in data processing. Each functional unit executes instructions to write data into pertinent register(s) in a register file. A functional unit may be any data computation unit, such as an arithmetic logic unit (ALU), an adder unit, a floating point unit, a load store unit, etc.

Since the functional units in a microprocessor may all deliver data to a register file in the same cycle, the register file should have the same number of write ports as there are functional units to satisfy the “peak data write requirement”, in which all the functional units generate data to be written into the register file in the same cycle. Thus, as the number of functional units in a microprocessor increases, the number of write ports of the register file should increase to satisfy the peak data write requirement.

An increase in the number of ports in a register file increases the area required to implement the register file and also the time required to access data in the register file. For example, in a data write mode, the number of write ports in a register file determines the number of data values (or, the amount of data) that can be simultaneously written into the register file.

Referring to FIG. 1, there is provided a block diagram illustrating a register file and functional units in a typical microprocessor. The microprocessor 10 may have “n” functional units FU₁-FU_(n) each of which can simultaneously produce data every cycle. In this case, to satisfy the peak data write requirement, the microprocessor 10 should have a register file 12 with the same number of write ports WP₁-WP_(n) as that of the functional units FU₁-FU_(n), i.e., “n” write ports.

When a microprocessor is required to have more functional units, the number of write ports of the register file in the microprocessor must also be increased. Such an increase in the number of write ports affects the size and speed of the microprocessor.

To overcome such problems in conventional microprocessors, a register file in a microprocessor may be designed to have fewer write ports than there are functional units. In such processors, it is necessary to arbitrate among the functional units for the write ports of the register file. In other words, an arbitration unit is required to manage data communication between the functional units and the write ports of the register file.

In an arbitration process, a functional unit should first send a request signal to an arbitration unit to write data into a register file. The arbitration unit receives all request signals from the functional units and then grants certain functional units access to the write ports in accordance with an arbitration logic. Then, the functional units whose requests have been granted may proceed to write data into the register file, and the other functional units, whose requests have not been granted, should request access again in the next cycle.

In a microprocessor adopting the arbitration technique, each functional unit should send an access request and wait for the grant, which causes additional delay in the data processing of the microprocessor. For example, the cycle time of the microprocessor may be increased by the time period required for the arbitration process. Also, the arbitration process may affect performance of the microprocessor by forcing functional units to stall if no write port is free.
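The stall behavior described above can be illustrated with a minimal Python sketch. The fixed-priority policy (lowest-numbered functional unit wins) and the function names `arbitrate` and `cycles_to_drain` are assumptions made for illustration only; the text says no more than that grants follow "an arbitration logic".

```python
def arbitrate(requests, num_write_ports):
    """Split requesting functional units into granted and stalled lists.

    `requests` is a list of functional-unit indices that want to write in
    this cycle. Fixed priority by list order is an assumed policy.
    """
    if num_write_ports < 1:
        raise ValueError("at least one write port is required")
    granted = requests[:num_write_ports]
    stalled = requests[num_write_ports:]
    return granted, stalled


def cycles_to_drain(requests, num_write_ports):
    """Count the cycles needed until every pending write has been granted."""
    pending = sorted(requests)
    cycles = 0
    while pending:
        # Units that were not granted must request again in the next cycle.
        _, pending = arbitrate(pending, num_write_ports)
        cycles += 1
    return cycles


if __name__ == "__main__":
    # Four functional units produce results in the same cycle.
    print(cycles_to_drain([0, 1, 2, 3], num_write_ports=4))  # one port per unit
    print(cycles_to_drain([0, 1, 2, 3], num_write_ports=2))  # arbitrated ports
```

With four simultaneous results and only two write ports, two functional units stall for an extra cycle, which is the performance cost attributed to arbitration above.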

Another example of a conventional approach in this area can be found in “The Multi-cluster Architecture: Reducing Cycle Time Through Partitioning” by K. I. Farkas et al., pp. 149-159, MICRO-30, December 1997. In this reference, architected registers are partitioned for the purpose of decoupling clusters and reducing the read and write ports of a register file. In this technique, data read and write operations can be performed only between particular register files and the functional units associated with them. This technique is described below with reference to FIG. 2.

In FIG. 2, the first and second functional units FU₁, FU₂ are associated with the first and second register files RF₁, RF₂, respectively. The first register file RF₁ has architected registers r₀-r₁₅, and the second register file RF₂ has architected registers r₁₆-r₃₁. The first functional unit FU₁ has efficient access to the architected registers r₀-r₁₅ in the first register file RF₁, and the second functional unit FU₂ has efficient access to the architected registers r₁₆-r₃₁ in the second register file RF₂. For example, the efficient access may be accomplished when instruction “r₇←r₁₁+r₁₂” is dispatched to the first functional unit FU₁, and instruction “r₁₇←r₂₃+r₃₁” is dispatched to the second functional unit FU₂.

However, this technique has drawbacks for instructions such as “r₇←r₁₁+r₃₁” dispatched to the first functional unit FU₁. In this case, to obtain the contents of the architected register r₃₁, the first functional unit FU₁ should have access to the second register file RF₂. The access path between the first functional unit FU₁ and the second register file RF₂ is so slow that performance of the microprocessor may be severely degraded.
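As a rough illustration of the partitioning in FIG. 2, the following Python sketch flags source registers that a functional unit would have to fetch over the slow path. The helper names `home_file` and `remote_sources`, and the encoding of FU₁/RF₁ as index 0 and FU₂/RF₂ as index 1, are hypothetical.

```python
REGS_PER_FILE = 16  # r0-r15 live in RF1, r16-r31 live in RF2 (FIG. 2)


def home_file(arch_reg):
    """Return 0 for a register held in RF1, 1 for a register held in RF2."""
    return arch_reg // REGS_PER_FILE


def remote_sources(unit, sources):
    """List the source registers that `unit` cannot read from its own file.

    `unit` is 0 for FU1 (paired with RF1) or 1 for FU2 (paired with RF2).
    """
    return [reg for reg in sources if home_file(reg) != unit]


if __name__ == "__main__":
    print(remote_sources(0, [11, 12]))  # r7 <- r11 + r12 on FU1
    print(remote_sources(1, [23, 31]))  # r17 <- r23 + r31 on FU2
    print(remote_sources(0, [11, 31]))  # r7 <- r11 + r31 on FU1
```

Only the third instruction reports a remote source (r₃₁), matching the problematic case described above.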

Another problem in the microprocessor in FIG. 2 is that computation of the microprocessor may be distributed unevenly. In other words, if the program being executed in the microprocessor uses mostly architected registers r₀-r₁₅ of the first register file RF₁, the computation for the program is not evenly distributed and the registers r₁₆-r₃₁ in the second register file RF₂ are not utilized.

Therefore, a need exists for a microprocessor having fewer write ports in a register file than the number of functional units, while avoiding problems such as performance delay or degradation caused by the arbitration process, data access through slow paths, uneven distribution of computation, etc.

OBJECTS AND SUMMARY OF THE INVENTION

It is an object of the present invention to provide a microprocessor having fewer write ports in a register file than the number of functional units in the microprocessor.

It is another object of the present invention to provide a method of designing a microprocessor with register files and functional units which satisfy the “peak data write requirement”, while the register files have fewer write ports than the number of functional units.

To accomplish the above and other objects of the present invention, there is provided a microprocessor for processing instructions, comprising a plurality of clusters for receiving the instructions, each of the clusters having a plurality of functional units for executing the instructions; and a plurality of register sub-files each having a plurality of registers for storing data for executing the instructions, wherein each of the clusters is associated with a corresponding one of the register sub-files so that an instruction dispatched to a cluster is executed by accessing registers in a register sub-file associated with the cluster to which the instruction is dispatched. Each of the register sub-files preferably has one write port to which a corresponding cluster sends data to be written into registers in a register sub-file associated with the corresponding cluster, and the register sub-files each have the same number of registers.

The microprocessor may also include a register-renaming unit for renaming target registers in an instruction with registers in a register sub-file associated with a cluster to which the instruction is dispatched. The register-renaming unit identifies a register to be used to store a value named by a target register in the instruction. The microprocessor may also include issue-queue units each of which is associated with a corresponding one of the clusters and an instruction dispatch mechanism for determining which of the clusters each instruction is dispatched to. An issue-queue unit holds an instruction renamed by the register-renaming unit until the renamed instruction is issued to be executed in a cluster associated with the issue-queue unit, and the instruction dispatch mechanism controls the issue-queue units to determine which of the instructions need to be executed.

In another aspect of the present invention, a system is provided for processing an instruction in a microprocessor. The system comprises at least one cluster having at least one functional unit for executing the instruction; and at least one register file having a predetermined number of physical registers to and from which data is written and read in accordance with the instruction, wherein the at least one register file has one write port to which an output of the at least one cluster is connected, and a data write operation in accordance with the instruction executed by the at least one functional unit is performed by accessing the physical registers of the at least one register file.

The system may also include means for renaming architected registers of the instruction with the physical registers of the at least one register file, and at least one issue-queue unit associated with the at least one cluster, for holding an instruction renamed by the means for renaming until the instruction is issued to be executed in the at least one cluster.

In another aspect of the present invention, a method is provided for processing instructions in a microprocessor. The method comprises the steps of providing clusters each having functional units for executing the instructions; dividing a register file into a plurality of register sub-files each having registers to store data for executing the instructions; associating each of the register sub-files with a corresponding one of the clusters; selecting a cluster to which an instruction is dispatched; renaming target registers in the instruction with registers in a register sub-file associated with the selected cluster; and dispatching the instruction to the selected cluster wherein the instruction is executed by functional units. The dividing step may also include assigning the same number of registers to each of the register sub-files. The associating step may include providing one write port for each of the register sub-files so that a cluster associated with a register sub-file sends data to be written to a write port of the register sub-file. The renaming step may include identifying a register in a register sub-file to be used to store a value named by a target register in the instruction.

These and other objects, features and advantages of the present invention will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating a register file and functional units in a conventional microprocessor;

FIG. 2 is a block diagram illustrating register files and functional units in another conventional microprocessor;

FIG. 3 is a block diagram illustrating register sub-files and clusters in a microprocessor according to a preferred embodiment of the present invention;

FIG. 4 is a block diagram for illustrating a microprocessor according to another embodiment of the present invention; and

FIG. 5 is a flow chart for describing operation of the microprocessor in FIG. 4.

DESCRIPTION OF PREFERRED EMBODIMENTS

Detailed illustrative embodiments of the present invention are disclosed herein. However, specific structural and functional details disclosed herein are merely representative for purposes of describing preferred embodiments of the present invention.

Referring to FIG. 3, a block diagram is provided for illustrating a microprocessor according to a preferred embodiment of the present invention. In the microprocessor 30, a register file is divided into multiple register sub-files RSF₀-RSF_(n). Each of the register sub-files RSF₀-RSF_(n) includes a set of physical registers (refer to FIG. 4). Preferably, the register sub-files RSF₀-RSF_(n) each have the same size, i.e., the same number of physical registers, and have one write port WP₀-WP_(n), respectively, through which data is written into registers in a corresponding register sub-file. Each register sub-file also has at least one read port RP₀-RP_(n) through which data is read from registers in a corresponding register sub-file.

The microprocessor 30 also has multiple clusters CL₀-CL_(n) each of which includes a set of functional units. Each register sub-file is associated with a corresponding one of the clusters. In the microprocessor 30, the clusters CL₀-CL_(n) are functionally and/or structurally associated with the register sub-files RSF₀-RSF_(n), respectively.

In this embodiment, a cluster sends data only to a register sub-file associated with the cluster in a data write operation, while a cluster can read data from any of the register sub-files RSF₀-RSF_(n) in a data read operation. For example, when a write instruction is dispatched to cluster CL₀ to be executed by the functional unit(s) therein, only register(s) in register sub-file RSF₀ associated with the cluster CL₀ may be accessed to write data therein. Thus, it is not necessary for each register sub-file to support write instructions issued from all the clusters CL₀-CL_(n). Instead, each register sub-file only needs to support write instructions from the functional units within a cluster associated with the register sub-file.

Referring to FIG. 4, it is assumed for convenience of description that a microprocessor 40 has two (2) clusters CL₀, CL₁ and two (2) register sub-files RSF₀, RSF₁. The first and second clusters CL₀, CL₁ are functionally and structurally associated with the first and second register sub-files RSF₀, RSF₁, respectively. The first and second clusters CL₀, CL₁ each have multiple functional units each generating one output result per cycle. The first and second register sub-files RSF₀, RSF₁ each have multiple physical registers. For example, the first register sub-file RSF₀ has physical registers R₀-R₃₉, and the second register sub-file RSF₁ has physical registers R₄₀-R₇₉.

The first and second register sub-files RSF₀, RSF₁ also have write ports WP₀, WP₁, respectively. Thus, the first cluster CL₀ (or functional units in the first cluster) accesses the registers in the first register sub-file RSF₀ to write data in the registers therein, and the second cluster CL₁ (or functional units in the second cluster) accesses the registers in the second register sub-file RSF₁ to write data in the registers therein. Since each cluster is associated with the corresponding register sub-file, the microprocessor 40 with register sub-files each having only one write port satisfies the peak data write requirement in a data write operation.
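The access rule of FIGS. 3 and 4 (a cluster writes only to its own register sub-file but may read from any sub-file) can be modeled with a short Python sketch. The class names `RegisterSubFile` and `Machine` and their methods are hypothetical; the register ranges R₀-R₃₉ and R₄₀-R₇₉ follow the example of FIG. 4.

```python
class RegisterSubFile:
    """A block of physical registers, identified by register number."""

    def __init__(self, first, count):
        self.regs = {first + i: 0 for i in range(count)}


class Machine:
    """Two clusters and two sub-files by default, as assumed for FIG. 4."""

    def __init__(self, num_clusters=2, regs_per_subfile=40):
        # Sub-file c is associated with cluster c.
        self.subfiles = [
            RegisterSubFile(c * regs_per_subfile, regs_per_subfile)
            for c in range(num_clusters)
        ]

    def write(self, cluster, phys_reg, value):
        """A cluster may write only into its own sub-file."""
        own = self.subfiles[cluster]
        if phys_reg not in own.regs:
            raise PermissionError(
                f"cluster {cluster} cannot write R{phys_reg}: not in its sub-file"
            )
        own.regs[phys_reg] = value

    def read(self, cluster, phys_reg):
        """A cluster may read from any sub-file."""
        for sub in self.subfiles:
            if phys_reg in sub.regs:
                return sub.regs[phys_reg]
        raise KeyError(phys_reg)


if __name__ == "__main__":
    m = Machine()
    m.write(0, 5, 99)    # CL0 writes R5 in RSF0
    print(m.read(1, 5))  # CL1 reads R5 across sub-files
```

An attempt such as `m.write(0, 45, 1)` raises `PermissionError` in this model, because R₄₅ belongs to RSF₁ and can be written only by CL₁.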

In the microprocessor 40, a register-renaming unit 42 is also provided for performing a register-renaming process on instructions to be transferred to the clusters CL₀, CL₁, where the instructions are then executed by the functional units therein. It should be noted that the register-renaming unit 42 may be configured outside the microprocessor 40, and that the register-renaming process may be implemented by a software program without any separate hardware structure.

In the register-renaming unit 42, architected registers in an instruction are mapped into physical registers in the register sub-files RSF₀, RSF₁. Architected registers are used to identify values associated with computation of a microprocessor. For example, in instructions “r₃←add r₇, r₉” and “r₃←mul r₃, r₂”, register r₃ is an architected register. Register r₃ first contains the result of the addition, which is then used as an input to the multiply. The result of the multiply is then stored in register r₃. Generally, there is a fixed number of architected registers for a particular instruction set architecture (ISA). For example, the PowerPC ISA has thirty-two (32) general purpose architected registers.

Physical registers in the register sub-files RSF₀, RSF₁ are the hardware realization of the architected registers. For a microprocessor, there can be more physical registers than architected registers. Thus, values named by a specific architected register may reside in different physical registers. For example, in the above instructions “r₃←add r₇, r₉” and “r₃←mul r₃, r₂”, the result of the addition may be placed in physical register R₅₄. Then, when the multiply is executed, the physical register R₅₄ is read to obtain its content and the result of the multiply may be placed in physical register R₂₀.

In a register-renaming process, each architected register is mapped into a corresponding one of the physical registers. In the above example, architected register r₃ may be mapped into physical register R₅₄ or R₂₀.
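The add/multiply example can be traced with a small mapping table in Python. This is a sketch only: the function `rename`, the tuple encoding of instructions, and the initial mapping of r₂, r₇ and r₉ to R₂, R₇ and R₉ are assumptions for illustration, while the choices of R₅₄ and R₂₀ come from the example above.

```python
def rename(instructions, allocations, initial_map):
    """Rename architected registers to physical registers.

    instructions: list of (target, op, src_a, src_b) using architected names.
    allocations:  the physical register chosen for each target, in order.
    initial_map:  architected -> physical mapping before the first instruction.
    """
    mapping = dict(initial_map)
    renamed = []
    for (target, op, src_a, src_b), phys in zip(instructions, allocations):
        # Sources are looked up BEFORE the target mapping is updated, so the
        # multiply reads the physical register written by the add.
        sources = (mapping[src_a], mapping[src_b])
        mapping[target] = phys
        renamed.append((phys, op) + sources)
    return renamed, mapping


if __name__ == "__main__":
    program = [("r3", "add", "r7", "r9"), ("r3", "mul", "r3", "r2")]
    start = {"r2": "R2", "r7": "R7", "r9": "R9"}  # hypothetical initial state
    renamed, final_map = rename(program, ["R54", "R20"], start)
    for inst in renamed:
        print(inst)
    print(final_map["r3"])
```

After renaming, the multiply reads R₅₄ (the result of the add) and architected register r₃ ends up mapped to R₂₀.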

Preferably, in the register-renaming process, target registers in an instruction are renamed with physical registers in the register sub-files RSF₀, RSF₁. In other words, the renaming identifies a physical register in a register sub-file that will be used to store the value named by a target register in an instruction. A target register is an architected register in an instruction that will be provided with a result of the instruction. For example, in the instruction “r₃←add r₇, r₉”, register r₃ is a target register.

Prior to the register-renaming process, it is necessary to determine which of the clusters each instruction is dispatched to. Such determination may be performed in an instruction dispatch mechanism 44. Once an instruction is determined to be dispatched to a particular cluster, target registers in the instruction are renamed with physical registers in a register sub-file which is functionally associated with the particular cluster. For example, when an instruction is determined to be dispatched to the first cluster CL₀ to be executed by the functional units therein, target registers of the instruction are renamed with the physical registers in the first register sub-file RSF₀, i.e., registers R₀-R₃₉.
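One way to picture cluster-aware renaming is a separate free list per register sub-file, so that a target register can only receive a physical register from the sub-file of the cluster chosen for the instruction. The following Python sketch assumes such free lists; the class `ClusterRenamer` and its method are hypothetical, and the reclaiming of physical registers is omitted.

```python
from collections import deque


class ClusterRenamer:
    """Allocate target registers from the chosen cluster's own sub-file."""

    def __init__(self, num_clusters=2, regs_per_subfile=40):
        # free[c] holds the unallocated physical registers of sub-file c,
        # e.g. R0-R39 for cluster 0 and R40-R79 for cluster 1 (FIG. 4).
        self.free = [
            deque(range(c * regs_per_subfile, (c + 1) * regs_per_subfile))
            for c in range(num_clusters)
        ]
        self.mapping = {}  # architected register number -> physical number

    def rename_target(self, cluster, arch_reg):
        """Map `arch_reg` to a free physical register of `cluster`'s sub-file."""
        if not self.free[cluster]:
            raise RuntimeError(f"sub-file {cluster} has no free register")
        phys = self.free[cluster].popleft()
        self.mapping[arch_reg] = phys
        return phys


if __name__ == "__main__":
    renamer = ClusterRenamer()
    print(renamer.rename_target(0, 3))  # r3 renamed for an instruction on CL0
    print(renamer.rename_target(1, 3))  # r3 renamed for an instruction on CL1
```

The cluster must be known before `rename_target` is called, which mirrors the ordering stated above: the dispatch decision precedes renaming.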

The microprocessor 40 may also include issue-queue units 46, 48 which are functionally associated with the register sub-files RSF₀, RSF₁, respectively. The issue-queue units 46, 48 hold the state identifying which of the instructions need to be executed. Thus, in the issue-queue units, register-renamed instructions (i.e., instructions after the register-renaming process) are held until they are issued to be executed by functional units in an appropriate cluster. The instruction dispatch mechanism 44 also determines which of the issue-queue units each instruction is transferred to.

In FIG. 5, a flow chart is provided for describing the method of register-renaming according to the present invention. In a microprocessor with a register file having multiple physical registers, the register file is divided into multiple register sub-files (step 51). As a result, each of the register sub-files has a predetermined number of physical registers and preferably one write port. The physical registers may be grouped evenly so that the register sub-files each have the same number of physical registers.

Each of the register sub-files is associated with a particular cluster having multiple functional units for executing instructions (step 53). A register sub-file is functionally associated with a corresponding cluster so that instructions dispatched to the cluster are supported by physical registers in the register sub-file associated with the corresponding cluster. Then, it is determined which of the clusters each instruction is dispatched to (step 55). Each instruction is dispatched to a selected cluster to be executed by functional units in that cluster.

The register-renaming process is performed with respect to the instructions, where architected registers (preferably, target registers) in an instruction are renamed with physical registers in the register sub-files (step 57). For example, when an instruction is determined to be dispatched to a cluster, target registers in the instruction are renamed with physical registers in a register sub-file associated with the cluster.

Upon completion of the register-renaming process, each instruction is dispatched to the corresponding cluster determined in step 55 (step 59). Thus, the instruction is executed by functional units in the cluster. For the execution of the instruction, only the physical registers in the register sub-file associated with the cluster are accessed to store data from the cluster.
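Steps 51 through 59 of FIG. 5 can be strung together in a compact Python sketch. Several choices here are assumptions rather than part of the described method: the round-robin cluster selection in step 55, the function names `build_subfiles` and `process`, the tuple encoding of instructions, and the convention that a source with no mapping yet keeps its architected name.

```python
from collections import deque
from itertools import cycle


def build_subfiles(total_regs, num_clusters):
    """Step 51: divide the physical registers evenly into sub-files."""
    size = total_regs // num_clusters
    return [deque(range(c * size, (c + 1) * size)) for c in range(num_clusters)]


def process(instructions, total_regs=80, num_clusters=2):
    """Return one issue queue per cluster, filled with renamed instructions."""
    # Step 53: sub-file c is associated with cluster c by sharing index c.
    free_lists = build_subfiles(total_regs, num_clusters)
    issue_queues = [[] for _ in range(num_clusters)]
    mapping = {}
    chooser = cycle(range(num_clusters))  # assumed round-robin dispatch policy

    for target, op, sources in instructions:
        cluster = next(chooser)  # step 55: select a cluster
        # Sources use the current mapping; unmapped names are left unchanged.
        phys_sources = tuple(mapping.get(s, s) for s in sources)
        # Step 57: rename the target from the selected cluster's sub-file.
        phys_target = f"R{free_lists[cluster].popleft()}"
        mapping[target] = phys_target
        # Step 59: dispatch to the issue queue of the selected cluster.
        issue_queues[cluster].append((phys_target, op, phys_sources))
    return issue_queues


if __name__ == "__main__":
    program = [("r3", "add", ("r7", "r9")), ("r3", "mul", ("r3", "r2"))]
    for cluster, queue in enumerate(process(program)):
        print(cluster, queue)
```

In this run the add is placed in the queue of cluster 0 with target R0, and the multiply is placed in the queue of cluster 1 with target R40 while reading R0 from the other sub-file, which is consistent with the write-local, read-anywhere rule of the embodiment.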

Having described preferred embodiments of a system and method of register-renaming in a microprocessor according to the present invention, modifications and variations can be readily made by those skilled in the art in light of the above teachings. It is therefore to be understood that, within the scope of the appended claims, the present invention can be practiced in a manner other than as specifically described herein.

1. A system for processing an instruction in a microprocessor, comprising: a plurality of clusters having at least one functional unit for executing the instruction; and a plurality of register files having a predetermined number of physical registers to and from which data is written and read in accordance with the instruction, wherein each of the register files has one write port to which an output of a corresponding cluster is connected, and data write operation in accordance with the instruction executed by the at least one functional unit is performed by accessing the physical registers of the at least one register file, and wherein each of the plurality of register files has at least one read port from which any of the plurality of clusters can read data.

2. The system of claim 1, wherein each of the plurality of clusters includes multiple functional units each for executing different instructions.

3. The system of claim 1, further including means for renaming architected registers of the instruction with the physical registers of each of the plurality of register files.

4. The system of claim 3, wherein the architected registers are target registers in which a result of the instruction is stored.

5. The system of claim 3, further including at least one issue-queue unit associated with the plurality of clusters, for holding instruction renamed by the means for renaming until the instruction is issued to be executed in one of the plurality of clusters.